EU AI ACT · REGULATION (EU) 2024/1689 · ARTICLE 9 · HIGH-RISK COMPLIANCE
ENFORCEMENT DEADLINE: AUG 2, 2026 · LANGGRAPH / LANGCHAIN AGENTS
ART.9
Compliance Architecture Proposal

EU AI ACT
ARTICLE 9
FOR LANGGRAPH

A technical proposal for building a continuous, documented risk management system into LangGraph and LangChain agents — satisfying all ten paragraphs of Article 9 of Regulation (EU) 2024/1689.

10 · Art. 9 paragraphs to address
4 · Core compliance pillars
3 · Build phases proposed
Lifecycle — not one-and-done
01 Regulatory Context

What Article 9 Actually Requires

⚖️ Regulatory Definition

Article 9 mandates that every high-risk AI system must have a continuously running, documented, and maintained risk management system spanning its full lifecycle — from design through post-market. It is not a one-time audit. It is an ongoing engineering process. Compliance deadline: August 2, 2026.

§1 — Establishment

Risk Management System Must Be Documented

A formal RMS must be established, implemented, documented and maintained. Not a single report — an operational, living system.

LangGraph has no built-in state for risk metadata or audit trails per run.
§2 — Continuous Lifecycle Process

Four-Step Iterative Loop

The system must: (a) identify/analyze risks to health, safety, fundamental rights, (b) estimate/evaluate those risks, (c) evaluate risks from post-market monitoring data, and (d) adopt targeted mitigation measures.

LangChain chains execute statelessly — no native risk context propagation between steps.
§3 — Scope Limitation

Only Mitigable Risks In Scope

Article 9 only covers risks that can be reasonably mitigated or eliminated through design, development, or adequate technical information to deployers.

Must define a risk taxonomy scoped to what the agent can actually control.
§4 — Interaction Effects

Consider Combined Requirements

Risk measures must consider interaction effects across the full set of requirements in Chapter III — not each requirement in isolation. Balance, don't over-engineer.

Multi-node LangGraph graphs create emergent behaviors — risk must be assessed at the graph level.
§5 — Residual Risk Acceptability

Residual Risk Must Be Judged Acceptable

After mitigation, the residual risk of each hazard and overall system residual risk must be judged acceptable. Requires a formal acceptability determination framework.

No LangChain/LangGraph native concept of "risk score" or "acceptability threshold" per invocation.
§6 — Mandatory Testing

Pre-Deployment Testing Against Defined Metrics

High-risk AI must be tested throughout development and before deployment. Testing must use predefined metrics and probabilistic thresholds appropriate to the intended purpose.

LangChain evals exist, but they are not tied to formal regulatory risk thresholds, nor documented as §6 evidence.
§7 — Timing of Testing

Testing at Any Point + Pre-Launch Mandatory

Testing must happen as appropriate at any time during development, and without exception before placing the system on the market or putting it into service.

CI/CD pipelines for LangGraph agents rarely include regulatory risk testing gates.
§8 — Post-Market Monitoring

Continuous Feedback Loop from Live Data

The RMS must incorporate post-market monitoring data (per Article 72) to re-evaluate and update risks. The system is never "finished."

No standard pattern for feeding LangGraph production traces back into a risk re-evaluation loop.
§9 — Vulnerable Groups

Special Consideration: Minors and Vulnerable Users

When the system's intended purpose may impact persons under 18 or other vulnerable groups, providers must give specific consideration to adverse impact vectors.

User-segment risk profiling must be baked into the agent's invocation context.
§10 — Sectoral Integration

May Merge With Existing RMS Under Other EU Law

If the organization already has risk management processes mandated by other EU laws (finance, medical, etc.), Art. 9 requirements can be integrated into those procedures.

Opportunity: build a composable RMS layer that plugs into existing compliance frameworks.
02 Gap Analysis

Why LangGraph/LangChain Don't Natively Comply

🔴 Critical Gap

No Persistent Risk State

LangGraph state checkpointing is functional, not regulatory. There is no native concept of a "risk event," a "mitigation decision," or a "residual risk score" per graph execution.

🔴 Critical Gap

No Audit Trail Standard

LangChain callbacks log execution events, but do not produce structured documentation that satisfies Article 9 §1's requirement for a maintained risk management system.

🔴 Critical Gap

No Pre-Defined Risk Metrics

LangSmith evaluators measure performance (quality, latency, accuracy) but are not structured around Article 9 §6's "prior defined metrics and probabilistic thresholds" for risk categories.

🟡 Significant Gap

No Post-Market Feedback Loop

Production traces in LangSmith are not automatically routed into a risk re-evaluation system. §8 requires this loop to be operational, not optional.

🟡 Significant Gap

No Graph-Level Risk Composition

LangGraph nodes are individually testable, but §4 requires risk assessment at the combined application level. Multi-agent graphs have emergent risk that node-level testing misses.

🟢 Manageable

Human-in-the-Loop Hooks Exist

LangGraph has interrupt/approval node patterns. These can be mapped to Article 14 (human oversight) requirements and partly address §5 residual risk acceptability via human review gates.

03 Solution Proposal

The Proposed Compliance Architecture

💡 Core Thesis

Build a Risk Management Middleware Layer that wraps your LangGraph agent. It adds four capabilities — risk context injection, structured event logging, evaluation gates, and post-market feedback routing — without modifying your core agent graph logic. Think of it as a compliance sidecar.
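As a framework-agnostic sketch, the sidecar idea reduces to wrapping the agent callable: inject risk context on the way in, log structured events on the way out, leave the agent logic untouched. All names below (`RiskLog`, `risk_middleware`, the `agent` signature) are illustrative, not LangGraph APIs.

```python
# compliance_sidecar.py — minimal sketch of the "compliance sidecar" pattern.
# Hypothetical names throughout; a real build would wrap a compiled LangGraph graph.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class RiskLog:
    """Append-only, in-memory stand-in for the RMS event store."""
    events: List[dict] = field(default_factory=list)

    def append(self, **event) -> None:
        self.events.append({"ts": datetime.now(timezone.utc).isoformat(), **event})


def risk_middleware(agent: Callable[[dict], dict], log: RiskLog) -> Callable[[dict], dict]:
    """Wrap an agent callable: inject default risk context, log entry/exit events."""
    def wrapped(state: dict) -> dict:
        # Risk context injection (§1, §9): ensure every invocation carries a segment.
        state.setdefault("risk_context", {"user_segment": "general_public"})
        log.append(kind="invocation_start",
                   user_segment=state["risk_context"]["user_segment"])
        result = agent(state)
        # Structured event logging (§1): record the post-run residual score.
        log.append(kind="invocation_end",
                   residual_risk_score=result.get("residual_risk_score", 0.0))
        return result
    return wrapped
```

Because the wrapper only touches the state dict, the same pattern applies whether the inner callable is a raw chain, a compiled graph, or a test stub.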

High-Level System Architecture — Article 9 Compliance Layer
🏗️
Input
Risk Context Injector
🔍
Runtime
LangGraph Agent
🛡️
Intercept
Risk Event Bus
📊
Evaluate
Risk Scoring Engine
🗄️
Persist
RMS Data Store
🔁
Feedback
Post-Market Monitor
04 Core Components

Four Pillars of the Solution

🏷️
Risk Context State
Addresses §1, §2, §3, §9

Extend LangGraph's StateGraph schema with a mandatory risk_context field. This carries risk metadata — user segment, hazard classes, active mitigations, invocation purpose — through every node. Makes risk a first-class citizen of agent state, not an afterthought.

📝
Structured RMS Logger
Addresses §1, §2d, §7, §8

A LangChain callback handler that intercepts every node transition, tool call, and LLM invocation, emitting structured risk events to a persistent store. Events include hazard ID, mitigation applied, residual risk score, and operator decision. Produces the documented audit trail Article 9 demands.

🧪
Regulatory Eval Suite
Addresses §6, §7

A pre-deployment test suite built on LangSmith evaluators, structured around pre-defined risk metrics and probabilistic thresholds (not just accuracy). Tests include: harmful output rate, fundamental-rights proxy scores, bias probes, and consistency-under-distribution-shift. Blocks deployment if thresholds breached.

🔁
Post-Market Monitor
Addresses §2c, §8

A production feedback pipeline that ingests LangSmith traces, flags anomalous risk events, and routes them back to the risk identification step. Triggers automatic risk re-evaluation when drift is detected. Closes the lifecycle loop that §2(c) and §8 explicitly require.

05 Technical Specification

Component-Level Build Specification

RiskStateSchema

LangGraph · State Extension

Extend TypedDict-based LangGraph state with required fields: risk_context (hazard_classes, user_segment, intended_purpose), active_mitigations (list of applied controls), risk_events (append-only list), residual_risk_score (float, updated by scoring node). Add a dedicated risk_assessment_node that runs at graph entry and after any tool call with side effects.

TypedDict StateGraph Annotated[list, operator.add]
Art. 9 §1 Art. 9 §2 Art. 9 §9

RiskEventCallback

LangChain · BaseCallbackHandler

Subclass BaseCallbackHandler (or AsyncCallbackHandler). Override on_llm_end, on_tool_end, on_chain_end to emit structured RiskEvent objects to a message queue or database. Each event carries: timestamp, node_id, hazard_id, mitigation_applied, residual_score, run_id (for traceability). Store events in append-only log (PostgreSQL / DynamoDB / BigQuery) for regulator access.

BaseCallbackHandler RiskEvent(dataclass) append-only log
Art. 9 §1 Art. 9 §7 Art. 12

HazardTaxonomy

Config · Risk Classification

A YAML/JSON-defined taxonomy of hazard classes relevant to your agent's domain. Each hazard entry carries: hazard_id, description, fundamental_rights_vector (which EU rights could be affected), likelihood_prior, severity_rating, and linked mitigation controls. This taxonomy is the documented output of the §2(a) "identification and analysis" step and must be versioned alongside code.

hazard_id: HAZ-001 severity: HIGH mitigations: [MIT-003]
Art. 9 §2a Art. 9 §2b Art. 9 §3
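A minimal sketch of what loading and validating such a taxonomy could look like, assuming the entry shape described above (`hazard_id`, `severity_rating`, linked `mitigations`); in practice the dict would be parsed from the versioned YAML file, and the field names here are an assumption, not a standard.

```python
# hazard_taxonomy.py — illustrative validator/lookup for a hazard taxonomy.
from typing import Dict, List

REQUIRED_FIELDS = {"hazard_id", "description", "fundamental_rights_vector",
                   "likelihood_prior", "severity_rating", "mitigations"}

# Stand-in for yaml.safe_load("hazard_taxonomy.yaml"); one hypothetical entry.
TAXONOMY: List[Dict] = [
    {
        "hazard_id": "HAZ-001",
        "description": "Biased output against a protected group",
        "fundamental_rights_vector": ["non-discrimination"],
        "likelihood_prior": 0.05,
        "severity_rating": "HIGH",
        "mitigations": ["MIT-003"],
    },
]


def validate_taxonomy(entries: List[Dict]) -> None:
    """Fail fast if an entry is missing a required field (§2a documentation gate)."""
    for entry in entries:
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"{entry.get('hazard_id', '?')} missing: {sorted(missing)}")


def mitigations_for(hazard_id: str, entries: List[Dict]) -> List[str]:
    """Look up the mitigation controls linked to a hazard class."""
    for entry in entries:
        if entry["hazard_id"] == hazard_id:
            return entry["mitigations"]
    raise KeyError(hazard_id)
```

Running the validator in CI alongside the code keeps the taxonomy versioned and reviewable like any other artifact.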

ResidualRiskScorer

LangGraph · Risk Node

A graph node (or callable injected into the should_continue condition) that computes a composite residual risk score after mitigation controls have been applied. Compares score against pre-defined acceptability thresholds (set per intended purpose per §6). If residual risk exceeds threshold, routes to a human_review_node (interrupt) before execution continues. Satisfies §5 residual risk acceptability requirement.

conditional_edge interrupt() risk_threshold.yaml
Art. 9 §4 Art. 9 §5 Art. 14
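The scoring-plus-routing logic can be sketched in plain Python. The aggregation formula below (noisy-OR over per-hazard residuals) is one possible choice, not something the Act prescribes, and the threshold values would come from the `risk_threshold.yaml` mentioned above.

```python
# residual_scorer.py — sketch of composite residual-risk scoring and edge routing.
from typing import Dict, List


def composite_residual_score(residuals: List[float]) -> float:
    """Noisy-OR aggregation: probability that at least one hazard materializes,
    treating per-hazard residual scores as independent probabilities."""
    score = 1.0
    for r in residuals:
        score *= (1.0 - r)
    return 1.0 - score


def route_after_scoring(score: float, thresholds: Dict[str, float]) -> str:
    """Decide the next graph edge: continue, human review, or hard stop.
    'human_review_node' maps to a LangGraph interrupt in practice."""
    if score >= thresholds["reject"]:
        return "reject_node"
    if score >= thresholds["human_review"]:
        return "human_review_node"
    return "continue"
```

Keeping the formula and routing pure functions makes them trivially unit-testable, which matters when the outputs double as §5 evidence.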

RegulatoryEvalSuite

LangSmith · CI/CD Gate

A LangSmith evaluator suite with custom metrics tied to the hazard taxonomy: (1) HarmfulOutputRate — proportion of runs triggering HAZ-class events, (2) FundamentalRightsProxy — LLM-graded assessment of output against a rights rubric, (3) BiasProbe — structured demographic parity tests, (4) ConsistencyUnderDrift — re-run with distribution-shifted inputs. Suite is run in CI/CD. Deployment is blocked if thresholds breach. Results are documented as §6/§7 evidence.

run_on_dataset() EvaluationResult threshold_gate.py
Art. 9 §6 Art. 9 §7 Art. 15
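The gate itself is simple once the evaluators have produced per-metric numbers. A minimal sketch of the `threshold_gate.py` logic, with illustrative metric names and threshold values (not values the regulation specifies):

```python
# threshold_gate.py — block deployment if any observed eval metric breaches
# its predefined threshold; the returned record doubles as §6/§7 evidence.
from typing import Dict, List, Tuple

THRESHOLDS: Dict[str, float] = {   # metric must stay strictly BELOW this value
    "harmful_output_rate": 0.01,
    "bias_probe_disparity": 0.05,
}


def gate(observed: Dict[str, float],
         thresholds: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (deploy_allowed, breached_metrics).
    A metric missing from `observed` counts as a breach — fail closed."""
    breaches = [m for m, limit in thresholds.items()
                if observed.get(m, float("inf")) >= limit]
    return (not breaches, breaches)
```

Failing closed on missing metrics is a deliberate choice: an eval suite that silently skipped a metric should not be able to green-light a deployment.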

PostMarketMonitor

Async Pipeline · §8 Loop

An async service (cron-based or event-driven) that pulls production RiskEvent logs, aggregates anomaly signals, and triggers a risk re-evaluation run when drift is detected — e.g. HarmfulOutputRate exceeds baseline by >2σ. Writes a timestamped risk re-evaluation report to the RMS datastore, closing the §2(c)/§8 feedback loop. Implements the Article 9 requirement that the RMS is "regularly reviewed and updated."

drift_detector.py anomaly_threshold risk_re_eval_report
Art. 9 §2c Art. 9 §8 Art. 72
06 Reference Implementation

Skeleton: Risk-Aware LangGraph State

# risk_state.py — Article 9 §1, §2 compliant state schema
from typing import Annotated, TypedDict, List
import operator
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RiskEvent:
    hazard_id: str               # e.g. "HAZ-001-BIAS"
    triggered_at: datetime
    node_id: str
    mitigation_applied: str      # e.g. "MIT-003-FILTER"
    residual_score: float        # 0.0 (no risk) → 1.0 (max risk)
    operator_reviewed: bool = False


@dataclass
class RiskContext:
    intended_purpose: str
    user_segment: str            # e.g. "general_public", "vulnerable_minor", "professional"
    hazard_classes: List[str] = field(default_factory=list)
    active_mitigations: List[str] = field(default_factory=list)


class Article9State(TypedDict):
    # Your existing agent state fields:
    messages: Annotated[list, operator.add]
    # Article 9 §1 — documented risk management fields:
    risk_context: RiskContext
    risk_events: Annotated[List[RiskEvent], operator.add]  # append-only log
    residual_risk_score: float   # §5 — must be judged acceptable
    risk_acceptance_status: str  # "pending" | "accepted" | "rejected"
    rms_run_id: str              # links to RMS datastore record
# risk_callback.py — Article 9 §1 documented maintenance, §7 pre-deployment evidence
from datetime import datetime

from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult

from risk_state import RiskEvent


# Async hook methods require the async handler base class.
class RiskEventCallback(AsyncCallbackHandler):
    def __init__(self, rms_store, hazard_classifier):
        self.rms_store = rms_store
        self.hazard_classifier = hazard_classifier

    async def on_llm_end(self, response: LLMResult, **kwargs):
        # Classify output against hazard taxonomy (Art. 9 §2a)
        hazards = await self.hazard_classifier.classify(response.generations)
        for hazard in hazards:
            event = RiskEvent(
                hazard_id=hazard.id,
                triggered_at=datetime.utcnow(),
                node_id=str(kwargs.get("run_id")),  # run UUID, for traceability
                mitigation_applied=hazard.auto_mitigation,
                residual_score=hazard.residual_score,
            )
            # Persist to append-only RMS store — satisfies §1 "maintained" requirement
            await self.rms_store.append_event(event)
07 Compliance Mapping

Article 9 Paragraph → Solution Component Mapping

Article 9 Paragraph Requirement Summary Solution Component
§1 — Establishment Documented, implemented and maintained RMS RiskEventCallback + RMS DataStore
§2(a) — Identify Identify/analyze known and foreseeable risks HazardTaxonomy (versioned YAML)
§2(b) — Estimate Estimate and evaluate risks ResidualRiskScorer node
§2(c) — Post-Market Evaluate risks from post-market monitoring data PostMarketMonitor pipeline
§2(d) — Mitigate Adopt targeted risk management measures RiskContext.active_mitigations + HazardTaxonomy mitigations
§3 — Scope Only mitigable risks in scope HazardTaxonomy scope definition + risk classification filter
§4 — Interaction Effects Consider combined application of requirements Graph-level RiskScorer (not node-level); composite scoring formula
§5 — Residual Risk Residual risk must be judged acceptable ResidualRiskScorer + acceptability_threshold.yaml + human interrupt
§6 — Testing Test against predefined metrics and thresholds RegulatoryEvalSuite (LangSmith)
§7 — Testing Timing Test throughout dev; mandatory before deployment CI/CD gate blocking deployment on threshold breach
§8 — Continuity Regularly reviewed, updated with live data PostMarketMonitor + drift-triggered re-evaluation
§9 — Vulnerable Groups Consider impact on minors and vulnerable users RiskContext.user_segment field + segment-specific hazard weights
§10 — Sectoral Integration May integrate with existing EU-law RMS RMS DataStore designed as composable — plugs into DORA/MDR/GDPR audit systems
08 Roadmap

Three-Phase Build Plan

01Weeks 1–4

Foundation — Risk State & Taxonomy

Extend your LangGraph StateGraph with Article9State. Author the HazardTaxonomy YAML for your specific agent domain — mapping hazard classes to fundamental rights vectors, likelihood priors, and mitigation controls. Wire up RiskEventCallback to an append-only event store. At the end of Phase 1, your agent produces a structured risk event log on every run. This alone satisfies §1 (documented, maintained) and §2(a/b) (identification/estimation).

Article9State schema HazardTaxonomy v1.0 RiskEventCallback RMS DataStore schema
02Weeks 5–8

Enforcement — Scoring, Thresholds & Eval Gates

Build the ResidualRiskScorer node and wire it into your graph's conditional edges. Define acceptability thresholds per hazard class in risk_thresholds.yaml. Build the RegulatoryEvalSuite in LangSmith with HarmfulOutputRate, FundamentalRightsProxy, and BiasProbe metrics. Integrate the suite as a required CI/CD gate — deployment to production is blocked if any threshold is breached. Document all test runs as §6/§7 evidence artifacts.

ResidualRiskScorer node risk_thresholds.yaml RegulatoryEvalSuite CI/CD deployment gate
03Weeks 9–12

Lifecycle Closure — Post-Market Monitoring & Documentation

Deploy the PostMarketMonitor as an async service. Configure drift detection thresholds per hazard class. Build the risk re-evaluation pipeline that triggers when drift is detected and writes timestamped re-evaluation reports to the RMS store. Generate a technical documentation bundle (Article 11) from RMS data — this is your compliance evidence package. Finally, validate §10 integration if other sectoral EU law applies: map RMS events into GDPR DPIA, DORA ICT risk log, or MDR QMS as appropriate.

PostMarketMonitor service drift_detector.py RMS Documentation Bundle §10 sectoral integration
📌 Scope Note

This proposal addresses Article 9 in isolation. A full high-risk AI system compliance program also requires: Article 10 (data governance), Article 11 (technical documentation), Article 12 (automatic logging), Article 13 (transparency), Article 14 (human oversight), and Article 15 (accuracy/cybersecurity). The components proposed here — particularly RiskEventCallback, the RMS DataStore, and the RegulatoryEvalSuite — are intentionally designed as foundations that future Articles 10–15 work can build on. This is not legal advice.